Lossless text compression using GPT-2 language model and Huffman coding
Abstract
Modern daily life activities produce large amounts of information, driven by advances in telecommunication. Storing this information on digital devices or transmitting it over the Internet is challenging, which creates the need for data compression. Research on data compression has therefore become a topic of great interest to researchers. Because compressed data is generally smaller than the original, compression saves storage and increases transmission speed. In this article, we propose a text compression technique using the GPT-2 language model and Huffman coding. In the proposed method, the Burrows-Wheeler transform and a list of keys are used to reduce the original text file's length. Finally, we apply the GPT-2 language model and then Huffman coding for encoding. The proposed method is compared with state-of-the-art text compression techniques, and we show that it demonstrates a gain in compression ratio over the other methods.
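The abstract names two classical building blocks, the Burrows-Wheeler transform and Huffman coding, without giving implementation details. The following is a minimal sketch of those two steps only, using the Python standard library. It is not the authors' pipeline: the GPT-2 stage and the "list of keys" step are omitted because the abstract does not specify them, and all function names and the end-of-string marker `\x03` are assumptions made for illustration.

```python
import heapq
from collections import Counter


def bwt(text, eos="\x03"):
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column."""
    assert eos not in text, "end-of-string marker must not occur in the input"
    s = text + eos
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last_column, eos="\x03"):
    """Invert the transform by repeatedly prepending the last column and sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(c + row for c, row in zip(last_column, table))
    original = next(row for row in table if row.endswith(eos))
    return original[:-1]


def huffman_code(symbols):
    """Build a prefix code {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tiebreak, {symbol: code built so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def encode(symbols, code):
    """Concatenate the codeword of every symbol into one bitstring."""
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    """Decode a bitstring by matching prefix-free codewords left to right."""
    inverse = {v: k for k, v in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)


if __name__ == "__main__":
    text = "banana bandana"
    transformed = bwt(text)
    code = huffman_code(transformed)
    bits = encode(transformed, code)
    # Round trip: Huffman-decode, then invert the transform.
    assert inverse_bwt(decode(bits, code)) == text
```

The rotation-sorting transform here is quadratic in memory and is meant only to make the idea concrete; practical implementations build the transform from a suffix array instead.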
Similar resources
Lossless Image Compression and Decompression Using Huffman Coding
This paper proposes a novel image compression method based on the Huffman encoding and decoding technique. Image files contain some redundant and inappropriate information. Image compression addresses the problem of reducing the amount of data required to represent an image. Huffman encoding and decoding is very easy to implement and it reduces the complexity of memory. Major goal of this paper is to pr...
The Novel Lossless Text Compression Technique Using Ambigram Logic and Huffman Coding
The new era of networking is looking forward to improved and effective methods of channel utilization. There are many texts where lossless data recovery is essential because of the importance of the information they hold. Therefore, a lossless decomposition algorithm that is independent of the nature and pattern of the text is today's top concern. Efficiency of algorithms used today varies grea...
Extending Huffman Coding for Multilingual Text Compression
Traditional text compression algorithms such as Huffman and LZ variants are usually based on 8-bit character sampling. However, under the Unicode representation for multilingual information, the character set of languages such as Chinese and Japanese consists of a very large number of distinct characters, and thus 16-bit or 32-bit character sampling is needed. Consequently, when text compress...
Notes on Lossless Data Compression and Huffman Coding
It is convenient to minimize the space needed to store data. Larger files will take longer to transfer over a data link, and will more quickly fill up disk quotas. Data compression techniques such as those used in common compression utilities allow reducing file sizes by exploiting redundancies in the data contained in them. Lossless coding techniques do this without compromising any informatio...
Lossless Grey-scale Image Compression using Source Symbols Reduction and Huffman Coding
The use of images has been increasing across many applications. Image compression plays a vital role in saving storage space and in saving time when sending images over a network. A new compression technique is proposed to achieve a higher compression ratio by reducing the number of source symbols. The source symbols are reduced by applying source symbols reduction and further the Huffman coding is applied to...
Journal
Journal title: SHS Web of Conferences
Year: 2021
ISSN: 2261-2424, 2416-5182
DOI: https://doi.org/10.1051/shsconf/202110204013